ABSTRACT


As an important platform for innovation, China's high-tech zones bear the responsibility of using innovation to drive industrial and economic progress. Starting from the multiple stages of innovation in high-tech zones, this paper examines how the input of innovation resources, the output of achievements, and the transformation of results affect innovation performance. Using cross-sectional data on 115 national key parks, indicators for innovation performance analysis are selected from the perspectives of input, output, and transformation. A discretization method based on an improved K-Means algorithm is used to discretize the data, and the dominance rough set approach is introduced to reduce the information system and to explore the relationship between the individual links of innovation in high-tech zones and their final innovation performance. The results show that the output and transformation stages are the main stages affecting the innovation performance of high-tech zones. R&D achievements and the level of services invested in the transformation phase are important factors influencing innovation performance. If a high-tech zone's achievements and service levels are low, its innovation performance is generally at a relatively backward level nationally; otherwise, its innovation performance is higher.
INTRODUCTION


As the primary driving force for development, innovation is the source of national progress. As important platforms for carrying out innovation activities and promoting innovation-driven development, high-tech zones have attracted attention from all sectors of society. China's high-tech zones bring together many factors that influence innovation and have produced a series of considerable innovation results. According to 2017 statistics, in terms of innovation resource input, high-tech zones pooled more than 30% of China's R&D investment and gathered more than 50% of its high-tech companies [1]. In terms of innovation achievements, enterprises in high-tech zones obtained 46.3% of all authorized invention patents granted in China in 2017, and the number of valid domestic invention patents owned by employees in high-tech zones was more than 8 times the national average [2]. In summary, China's high-tech zones have become the main force for raising the level of innovation.


As a main force of national innovation, high-tech zones have great advantages in achieving industrial and economic development through innovation. Innovation in a high-tech zone is a dynamic process that mainly comprises two stages, technology research and development (R&D) and achievement transformation, and ultimately yields products with economic value. The ultimate goal of innovation in high-tech zones is to obtain rich innovation results and to promote the continuous improvement of the industrial economy, which is an important dimension for measuring their final innovation performance. Therefore, identifying the factors that affect innovation performance across these two major stages, screening the indicators most closely related to performance, and uncovering the underlying rules can help improve the innovation performance of high-tech zones.


Performance analysis methods include the analytic hierarchy process, data envelopment analysis, and factor analysis, among others. The dominance rough set approach supports knowledge induction, attribute reduction, and rule extraction, and its results are easy to interpret. Compared with classic rough set theory, dominance rough sets deal more effectively with decision problems whose attributes carry preference information, so the approach is well suited to analyzing the innovation performance of high-tech zones. In addition, the K-Means-based discretization method yields discretization results with high classification quality and approximation accuracy [3], so this paper adopts it for data discretization. Based on this analysis, the paper considers the successive stages of innovation development in high-tech zones, selects performance analysis indicators from the perspectives of innovation resource input, achievement output, and achievement transformation, uses the dominance rough set to screen the indicators, and mines the important influencing factors together with the knowledge rules linking these factors to innovation performance.


The first part of this article explains the important role of innovation in high-tech zones and the applicability of dominance rough set theory to innovation performance analysis. The second part presents the research methods. The third part examines the innovation process in China's high-tech zones, combines K-Means and the dominance rough set to mine knowledge rules for innovation performance from park data, and analyzes the results. Finally, conclusions are drawn.
RESEARCH METHOD 


This article uses the dominance rough set method to evaluate and analyze the innovation performance of Chinese high-tech zones. The dominance rough set method naturally preserves the partial order among the values of each attribute, which makes it an important method for analyzing information systems with preference orders. Because discretizing attribute values is a prerequisite for applying rough sets, this paper first discretizes the continuous data in the information system with a method based on an improved K-Means algorithm, and then applies the dominance rough set to mine knowledge from the information system, yielding objective and reasonable performance analysis results.
Data Discretization Method based on Improved K-Means 


Discretization of continuous attribute data is an important part of performance evaluation with rough set methods, and its quality directly affects the subsequent analysis results [4]. The K-Means clustering method (KCM) [5] is an unsupervised clustering algorithm that groups highly similar data objects into the same cluster, thereby discretizing continuous data.


The steps for discretizing continuous attribute data using the classic K-Means algorithm [6-7] are as follows: 


First, select k data objects from the original data set as the initial cluster centers. Second, calculate the Euclidean distance between each data object and each cluster center, and assign each object to the nearest cluster. Then, update each cluster center as the mean of the objects assigned to it. Finally, repeat the assignment and update steps until the evaluation function converges.
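The four steps above can be sketched for a single continuous attribute as follows. This is a minimal illustration rather than the paper's implementation: the function name `kmeans_1d` is hypothetical, the Euclidean distance on one attribute reduces to |a - b|, and the initial centers are chosen deterministically at evenly spaced ranks of the sorted data (an assumption made for reproducibility; the classic algorithm usually picks them at random).

```python
def kmeans_1d(values, k, max_iter=100):
    """Classic K-Means on one continuous attribute; returns (labels, centers)."""
    ordered = sorted(values)
    n = len(ordered)
    # Step 1: take k data objects at evenly spaced ranks as the initial centers
    # (a deterministic stand-in for random selection).
    centers = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Step 2: assign each object to the nearest center (1-D Euclidean distance).
        labels = [min(range(k), key=lambda i: abs(v - centers[i])) for v in values]
        # Step 3: move each center to the mean of its members; an empty cluster keeps its center.
        new_centers = []
        for i in range(k):
            members = [v for v, lab in zip(values, labels) if lab == i]
            new_centers.append(sum(members) / len(members) if members else centers[i])
        # Step 4: stop once the centers (and hence the squared-error objective) stop changing.
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


labels, centers = kmeans_1d([1.0, 1.2, 0.9, 5.0, 5.3, 4.8], k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

With random initialization the result can depend on the starting centers, which is one reason the number of clusters and the centers are hard to pin down with the classic method.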


The classic K-Means method has the problem that the number of clusters, and hence the cluster centers, cannot be determined accurately in advance. A cluster index is therefore introduced, which is very sensitive to the number of clusters: as the number of clusters increases toward the optimal number, the index value drops sharply, whereas once the number of clusters reaches or exceeds the optimum, further increases make the index decrease only slowly [8]. The mean, over all clusters, of the average distance from each object to its cluster center is selected as the index for judging whether the number of clusters is appropriate, as shown in Eq. (1) [9]:

$$E = \frac{1}{k}\sum_{i=1}^{k}\frac{1}{n_i}\sum_{x \in C_i} d(x, e_i) \tag{1}$$

where $k$ is the number of clusters, $n_i$ is the number of data objects in the i-th cluster $C_i$, $d(\cdot)$ is the Euclidean distance function, and $e_i$ is the center of $C_i$.
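A minimal sketch of the cluster index in Eq. (1), assuming each cluster is given as a list of points (tuples) together with its center; the function name `cluster_index` and the toy data are illustrative only.

```python
import math


def cluster_index(clusters, centers):
    """Cluster index E of Eq. (1): mean over clusters of the average member-to-center distance."""
    k = len(clusters)
    total = 0.0
    for members, center in zip(clusters, centers):
        n_i = len(members)
        # (1 / n_i) * sum of d(x, e_i) over x in C_i, with d the Euclidean distance
        total += sum(math.dist(x, center) for x in members) / n_i
    return total / k


# Two clusters of 2-D points with centers e_1 = (1, 0) and e_2 = (10, 10)
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0), (10.0, 10.0)]]
centers = [(1.0, 0.0), (10.0, 10.0)]
print(cluster_index(clusters, centers))  # 0.5
```

Evaluating E for k = 1, 2, 3, ... and looking for the point where its decline turns from steep to gradual is how the index is meant to indicate the number of clusters.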


The discretization method for continuous attribute data based on improved K-Means first divides the continuous attribute data objects into multiple clusters with K-Means clustering, and then uses the cluster index to determine the optimal number of clusters. Finally, the label of each cluster is extracted and used to replace all continuous values in that cluster, which completes the discretization.
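The three steps can be combined into one sketch for a single attribute. Everything here is an assumption for illustration: the helper names are hypothetical, initial centers are taken at evenly spaced ranks, and because the text does not say how the sharp-to-slow change in E is detected, the sketch picks the k with the largest difference between the drop in E before k and the drop after k (a common "elbow" heuristic). Cluster labels are ordinal ranks, 0 being the cluster with the lowest center.

```python
def kmeans_1d(values, k, max_iter=100):
    """1-D K-Means with evenly spaced initial centers (deterministic; an assumption)."""
    ordered = sorted(values)
    centers = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda i: abs(v - centers[i])) for v in values]
        new_centers = []
        for i in range(k):
            members = [v for v, lab in zip(values, labels) if lab == i]
            new_centers.append(sum(members) / len(members) if members else centers[i])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def cluster_index(values, labels, centers):
    """Eq. (1) for one attribute: mean over non-empty clusters of the average |x - e_i|."""
    averages = []
    for i, center in enumerate(centers):
        members = [v for v, lab in zip(values, labels) if lab == i]
        if members:  # skip empty clusters, for which 1 / n_i is undefined
            averages.append(sum(abs(v - center) for v in members) / len(members))
    return sum(averages) / len(averages)


def choose_k(values, k_max):
    """Elbow heuristic (assumption): maximize (E[k-1] - E[k]) - (E[k] - E[k+1]); needs k_max >= 3."""
    E = {k: cluster_index(values, *kmeans_1d(values, k)) for k in range(1, k_max + 1)}
    return max(range(2, k_max), key=lambda k: (E[k - 1] - E[k]) - (E[k] - E[k + 1]))


def discretize(values, k_max=6):
    """Replace every value by the rank of its cluster (0 = lowest center)."""
    k = choose_k(values, k_max)
    labels, centers = kmeans_1d(values, k)
    order = sorted(range(k), key=lambda i: centers[i])
    rank = {cluster: r for r, cluster in enumerate(order)}
    return [rank[lab] for lab in labels], k


ranks, k = discretize([5.0, 0.9, 9.1, 1.0, 4.8, 9.0, 1.1, 5.2, 8.9], k_max=5)
print(k, ranks)  # 3 [1, 0, 2, 0, 1, 2, 0, 1, 2]
```

With four clusters, ranks 0 to 3 would play the role of the Low, Second Low, Second High and High labels used later in the decision table.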
Dominance Rough Set Theory 


Rough set theory was pioneered by the Polish scientist Z. Pawlak in 1982. It is a mathematical tool that can be applied effectively to knowledge expression systems with uncertainty; its main idea is to reduce knowledge and mine classification rules while keeping the classification ability of the knowledge expression system unchanged. When classic rough set theory is applied to a knowledge expression system containing preference-ordered attributes, it ignores that order and can therefore produce inconsistent decisions. Greco et al. therefore proposed dominance rough set theory, which overcomes this deficiency of the classic rough set by replacing the indiscernibility relation with a dominance relation [10].


The basic concepts of dominance rough set theory are as follows.


Knowledge Expression System and Decision Table 


A knowledge expression system is a quadruple $S = (U, A, V, f)$, where $U$ is a non-empty finite set of objects, also called the universe; $A$ is a non-empty finite set of attributes with $A = C \cup D$ and $C \cap D = \emptyset$, where $C$ is the set of condition attributes and $D$ is the set of decision attributes, in which case $S$ is also called a decision table; $V = \bigcup_{a \in A} V_a$, where $V_a$ is the value range of attribute $a$; and $f: U \times A \to V$ is an information function such that $f(x, a) \in V_a$ for every $x \in U$ and $a \in A$ [11].


Dominating Relationship and Dominating Set 


Let $P \subseteq C$ and $x, y \in U$. If $f(y, q) \ge f(x, q)$ holds for every $q \in P$, then $y$ is at least as good as $x$ with respect to $P$, denoted $y D_P x$; $D_P$ is called the dominance relation.

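Given $P \subseteq C$ and $x \in U$, the P-dominating set and the P-dominated set of $x$ are defined as [12]:

$$D_P^{+}(x) = \{y \in U : y D_P x\}, \qquad D_P^{-}(x) = \{y \in U : x D_P y\} \tag{2}$$

Dominance-Based Rough Approximation

According to the values of the decision attribute $D$, the universe can be partitioned into $U/D = Cl = \{Cl_t : t = 1, 2, \ldots, n\}$, where $Cl_t$ is the t-th equivalence class and the classes are preference-ordered as $Cl_n \succ \cdots \succ Cl_t \succ \cdots \succ Cl_1$. Taking upward and downward unions of the classes gives:

$$Cl_t^{\ge} = \bigcup_{s \ge t} Cl_s, \qquad Cl_t^{\le} = \bigcup_{s \le t} Cl_s, \qquad t, s \in \{1, 2, \ldots, n\} \tag{3}$$

The lower and upper approximations of $Cl_t^{\ge}$ are:

$$\underline{apr}_P(Cl_t^{\ge}) = \{x \in U : D_P^{+}(x) \subseteq Cl_t^{\ge}\} \tag{4}$$

$$\overline{apr}_P(Cl_t^{\ge}) = \{x \in U : D_P^{-}(x) \cap Cl_t^{\ge} \neq \emptyset\} \tag{5}$$

Similarly, the lower and upper approximations of $Cl_t^{\le}$ are:

$$\underline{apr}_P(Cl_t^{\le}) = \{x \in U : D_P^{-}(x) \subseteq Cl_t^{\le}\} \tag{6}$$

$$\overline{apr}_P(Cl_t^{\le}) = \{x \in U : D_P^{+}(x) \cap Cl_t^{\le} \neq \emptyset\} \tag{7}$$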
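The definitions in Eqs. (2) to (7) can be checked on a toy decision table. The sketch below is illustrative only: the table, the function names, and the encoding of ordinal criteria as integers (higher is better) and of decision classes as 1..n are assumptions.

```python
def dominates(a, b, P):
    """a D_P b: object a is at least as good as b on every criterion in P."""
    return all(a[q] >= b[q] for q in P)


def dominating_set(x, U, P):
    """D_P^+(x): objects that dominate x."""
    return {y for y in U if dominates(U[y], U[x], P)}


def dominated_set(x, U, P):
    """D_P^-(x): objects dominated by x."""
    return {y for y in U if dominates(U[x], U[y], P)}


def approx_upward(U, dec, P, t):
    """Lower and upper approximations of Cl_t^>= (Eqs. 4 and 5)."""
    union = {y for y in U if dec[y] >= t}
    lower = {x for x in U if dominating_set(x, U, P) <= union}
    upper = {x for x in U if dominated_set(x, U, P) & union}
    return lower, upper


def approx_downward(U, dec, P, t):
    """Lower and upper approximations of Cl_t^<= (Eqs. 6 and 7)."""
    union = {y for y in U if dec[y] <= t}
    lower = {x for x in U if dominated_set(x, U, P) <= union}
    upper = {x for x in U if dominating_set(x, U, P) & union}
    return lower, upper


# Toy decision table (hypothetical): ordinal criteria, higher is better.
U = {
    "a": {"C4": 1, "C7": 1},
    "b": {"C4": 2, "C7": 1},
    "c": {"C4": 2, "C7": 1},  # same evaluations as b, but a better decision class
    "d": {"C4": 3, "C7": 3},
}
dec = {"a": 1, "b": 1, "c": 2, "d": 2}
P = ["C4", "C7"]
print(approx_upward(U, dec, P, 2))    # lower approximation {d}; upper {b, c, d}
print(approx_downward(U, dec, P, 1))  # lower approximation {a}; upper {a, b, c}
```

Objects b and c have identical evaluations but different classes, so both fall between the lower and upper approximations, which is exactly the inconsistency the dominance-based approximations are designed to expose.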
Classification Quality and Attribute Reduction 


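The ratio of the number of objects correctly classified with respect to $P$ to the total number of objects in the knowledge expression system is called the quality of classification. For the classification $Cl$ it is computed as:

$$\gamma_P(Cl) = \frac{\left|U - \left(\left(\bigcup_{t} bnd_P(Cl_t^{\ge})\right) \cup \left(\bigcup_{t} bnd_P(Cl_t^{\le})\right)\right)\right|}{|U|} \tag{8}$$

where $bnd_P(Cl_t^{\ge}) = \overline{apr}_P(Cl_t^{\ge}) - \underline{apr}_P(Cl_t^{\ge})$ and $bnd_P(Cl_t^{\le}) = \overline{apr}_P(Cl_t^{\le}) - \underline{apr}_P(Cl_t^{\le})$ are the boundary regions.

A minimal subset $P \subseteq C$ satisfying $\gamma_P(Cl) = \gamma_C(Cl)$ is called a reduct of $C$ with respect to $Cl$ and is denoted $RED_{Cl}(P)$. The intersection of all reducts is called the core.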
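The quality of classification in Eq. (8) and the search for reducts can be illustrated with an exhaustive search over attribute subsets. This is a sketch under stated assumptions, not the procedure used by the analysis software: it only scales to a handful of attributes, the toy table is hypothetical, and `quality` and `reducts` are illustrative names.

```python
from fractions import Fraction
from itertools import combinations


def dominates(a, b, P):
    return all(a[q] >= b[q] for q in P)


def quality(U, dec, P):
    """Quality of classification gamma_P(Cl), Eq. (8): share of objects outside every boundary."""
    boundary = set()
    for t in set(dec.values()):
        up = {y for y in U if dec[y] >= t}
        down = {y for y in U if dec[y] <= t}
        for x in U:
            d_plus = {y for y in U if dominates(U[y], U[x], P)}   # D_P^+(x)
            d_minus = {y for y in U if dominates(U[x], U[y], P)}  # D_P^-(x)
            # x is in bnd(Cl_t^>=) if it is in the upper but not the lower approximation
            if d_minus & up and not d_plus <= up:
                boundary.add(x)
            # likewise for bnd(Cl_t^<=)
            if d_plus & down and not d_minus <= down:
                boundary.add(x)
    return Fraction(len(U) - len(boundary), len(U))


def reducts(U, dec, C):
    """All minimal P of C with gamma_P = gamma_C, searched from the smallest subsets up."""
    target = quality(U, dec, C)
    found = []
    for size in range(1, len(C) + 1):
        for P in combinations(C, size):
            if any(r <= set(P) for r in found):
                continue  # a superset of a reduct cannot be minimal
            if quality(U, dec, P) == target:
                found.append(set(P))
    return found


# Hypothetical table with three ordinal condition attributes and two classes.
U = {
    "a": {"C1": 2, "C4": 1, "C7": 2},
    "b": {"C1": 1, "C4": 2, "C7": 1},
    "c": {"C1": 2, "C4": 2, "C7": 2},
    "d": {"C1": 3, "C4": 3, "C7": 3},
}
dec = {"a": 1, "b": 1, "c": 2, "d": 2}
C = ["C1", "C4", "C7"]
found = reducts(U, dec, C)
core = set.intersection(*found)
print(quality(U, dec, ["C4"]))                 # 1/2
print([sorted(r) for r in found], sorted(core))  # [['C1', 'C4'], ['C4', 'C7']] ['C4']
```

Here C4 alone leaves objects b and c inconsistent, so it is not a reduct, yet it appears in every reduct and therefore forms the core.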

Preference Decision Rules 

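After the dominance-based rough approximations are obtained, the following preference decision rules can be derived:

If $f(x, q_1) \ge r_{q_1} \wedge f(x, q_2) \ge r_{q_2} \wedge \cdots \wedge f(x, q_p) \ge r_{q_p}$, then $x \in Cl_t^{\ge}$;

If $f(x, q_1) \le r_{q_1} \wedge f(x, q_2) \le r_{q_2} \wedge \cdots \wedge f(x, q_p) \le r_{q_p}$, then $x \in Cl_t^{\le}$;

where $\{q_1, q_2, \ldots, q_p\} \subseteq C$, $(r_{q_1}, r_{q_2}, \ldots, r_{q_p}) \in V_{q_1} \times V_{q_2} \times \cdots \times V_{q_p}$, and $t \in \{1, 2, \ldots, n\}$.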
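A rule of either form can be represented as a list of elementary conditions $(q, \ge \text{or} \le, r_q)$ plus a concluded union. The sketch below, with hypothetical names and a toy table, checks whether such a rule is certain on a decision table (every object it covers belongs to the concluded union) and counts its support, taken here as the number of covered objects that belong to that union.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le}


def covers(conditions, x):
    """True if object x satisfies every elementary condition (q, op, r_q) of the rule."""
    return all(OPS[op](x[q], r) for q, op, r in conditions)


def evaluate_rule(conditions, conclusion, U, dec):
    """Return (is_certain, support) for a rule concluding ('>=', t) or ('<=', t)."""
    op, t = conclusion
    covered = [y for y in U if covers(conditions, U[y])]
    supporting = [y for y in covered if OPS[op](dec[y], t)]
    # A certain rule: every covered object lies in the concluded union Cl_t^>= or Cl_t^<=.
    return len(supporting) == len(covered), len(supporting)


# Hypothetical table: ordinal criteria and decision classes 1..3.
U = {
    "a": {"C4": 1, "C7": 1},
    "b": {"C4": 2, "C7": 2},
    "c": {"C4": 3, "C7": 2},
    "d": {"C4": 3, "C7": 3},
}
dec = {"a": 1, "b": 2, "c": 3, "d": 3}
rule = [("C4", ">=", 3), ("C7", ">=", 2)]  # if f(x, C4) >= 3 and f(x, C7) >= 2 ...
print(evaluate_rule(rule, (">=", 3), U, dec))  # (True, 2)
```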


ANALYSIS OF INNOVATION PERFORMANCE OF CHINESE HIGH-TECH ZONES 


Indicator Selection and Data Source 


As a typical form of cross-organizational cooperation that integrates enterprises, scientific research institutions, and intermediary service platforms, a high-tech zone involves several linked activities in its innovation process: resource input → technology research and development → achievement output → achievement transformation → production and application → improvement in enterprises' operating income or in the level of the industrial economy. If the innovation performance of high-tech zones is analyzed only from the traditional input-output perspective, the multi-stage nature of their innovation activities and the links between the stages are easily overlooked. This article therefore analyzes innovation performance from the multiple dimensions of resource input, achievement output, and achievement transformation, and further explores the relationship between the innovation factors of each stage and performance.


Innovation in a high-tech zone mainly comprises the technology R&D stage and the achievement transformation stage. In the R&D stage, the zone invests scientific research personnel and funds to produce scientific and technological innovation achievements. It then invests in scientific and technological innovation services related to transforming these achievements, and finally obtains innovation performance, represented by the operating income of high-tech industries. Considering the relationship between these two stages, scientific and technological achievements can only be regarded as intermediate outputs, and final performance must be measured by the relevant level of the industrial economy. Therefore, this article uses the operating income of high-tech industries as the decision attribute for the rough set analysis, and takes the resource input, achievement output, and transformation factors of the innovation process as condition attributes. The specific attribute indicators are shown in Table 1.


Table 1: Attribute Indicators of High-Tech Zones' Innovation Performance Analysis

| Indicator properties | Indicator dimension | Indicator name | Unit | Code |
|---|---|---|---|---|
| Conditional attribute indicator | Investment in scientific and technological innovation | R&D personnel full-time equivalent | Man-year | C1 |
| | | R&D internal expenditure | Ten thousand yuan | C2 |
| | Scientific and technological innovation achievements | Formation of national or industry standards and participation in the development of international standards | Pieces | C3 |
| | | Number of significant intellectual property rights | Pieces | C4 |
| | Scientific and technological innovation services | Number of innovation service agencies | Each | C5 |
| | | Number of companies in technology incubators and accelerators | | C6 |
| | | High-tech service industry employees | People | C7 |
| Decision attribute indicator | Scientific and technological innovation performance | High-tech industry operating income | Ten thousand yuan | D |
The research data come from the "National Key Park Innovation Test Report" [13] recently released by the Ministry of Science and Technology of the People's Republic of China. Statistical data on 115 national key high-tech zones are selected from this report to form the information system for analyzing the innovation performance of national high-tech zones.


Data Discretization and Construction of the Multi-Criteria Decision Table Based on Improved K-Means


Because the values of the attribute indicators in the information system span wide ranges (that is, the difference between the maximum and minimum of each indicator is large), they would distort the K-Means classification. The raw data are therefore standardized to eliminate the influence of each indicator's scale and magnitude, using the following formula:


$$x^{*} = \frac{\max - x}{\max - \min} \tag{9}$$

where $\max$ and $\min$ are the maximum and minimum values of the attribute indicator, $x$ is the original value, and $x^{*}$ is the standardized value.
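Eq. (9) can be transcribed directly for one attribute column. The function name `standardize` and the handling of a constant column are assumptions. Note that, as written, Eq. (9) maps the largest raw value to 0 and the smallest to 1, so when clusters are later ranked to assign ordinal labels, that order has to be read in reverse; the variant $(x - \min)/(\max - \min)$ would keep the original direction.

```python
def standardize(column):
    """Eq. (9): x* = (max - x) / (max - min) for one attribute column."""
    hi, lo = max(column), min(column)
    if hi == lo:
        # Constant column: Eq. (9) would be 0 / 0; map every value to 0 (an assumption).
        return [0.0] * len(column)
    return [(hi - x) / (hi - lo) for x in column]


print(standardize([10, 20, 30]))  # [1.0, 0.5, 0.0]
```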


Then, using the improved K-Means discretization method described in the previous section, the standardized data are discretized to obtain the multi-criteria decision table shown in Table 2.


Table 2: Multi-Criteria Decision Table for High-Tech Zones' Innovation Performance Analysis

| High-tech zone | C1 | C2 | C3 | C4 | C5 | C6 | C7 | D |
|---|---|---|---|---|---|---|---|---|
| Beijing Zhongguancun | High | High | High | High | High | High | High | High |
| Tianjin | … | … | Second High | Second High | Second High | Second High | … | Second High |
| … | … | … | … | … | … | … | … | … |
| … | Low | Low | Low | Low | Low | Low | Low | Low |
Attribute Reduction and Rule Generation Based on Dominance Rough Set


As Table 2 shows, both the condition attributes and the decision attribute in the decision table carry preference information. According to the decision attribute, four preference-ordered classes are obtained: $Cl_1 = \{\text{Low}\}$, $Cl_2 = \{\text{Second Low}\}$, $Cl_3 = \{\text{Second High}\}$, $Cl_4 = \{\text{High}\}$. The following unions of decision classes are then obtained:
• $Cl_1^{\le} = Cl_1$, indicating that the innovation performance of the high-tech zone is low;
• $Cl_2^{\le} = Cl_1 \cup Cl_2$, indicating that the innovation performance of the high-tech zone is at most second low;
• $Cl_2^{\ge} = Cl_2 \cup Cl_3 \cup Cl_4$, indicating that the innovation performance of the high-tech zone is at least second low;
• $Cl_3^{\le} = Cl_1 \cup Cl_2 \cup Cl_3$, indicating that the innovation performance of the high-tech zone is at most second high;
• $Cl_3^{\ge} = Cl_3 \cup Cl_4$, indicating that the innovation performance of the high-tech zone is at least second high;
• $Cl_4^{\ge} = Cl_4$, indicating that the innovation performance of the high-tech zone is high.


The 115 analysis objects were divided into a training set and a testing set. Following the common statistical convention that a large sample should contain no fewer than 30 observations, the testing set contains the last 30 analysis objects and the training set contains the first 85.


DOMLEM is the most widely used rule induction algorithm for dominance rough sets, and it can be run with the 4eMka2 software. The training set data are first entered into 4eMka2 to determine the dominance relations; the reducts of the training set and their core are then searched; finally, the preference decision rules are derived from a reduct.


Three reducts were found for the training set: {C4, C5, C7}, {C3, C4, C6, C7}, and {…, …, C4, C7}. The core of the reducts is {C4, C7}. Examining the reducts shows that the number of significant intellectual property rights (C4) and the number of high-tech service industry employees (C7) are important factors affecting the innovation performance of high-tech zones; that is, achievement output and investment in the transformation stage are key to determining innovation performance, while investment in the technology R&D stage (C1, C2) has a relatively small impact.
The preference decision rule sets derived from the reduct {C4, C5, C7} are shown in Table 3 and Table 4.


Table 3: D≤ Preference Decision Rule Set

| No. | Preference decision rule | Support |
|---|---|---|
| 1 | If the number of significant intellectual property rights is low, the number of innovation service agencies is at most second low, and the number of employees in the high-tech service industry is at most second low, then the innovation performance of the high-tech zone is low. | 65 |
| 2 | If the number of significant intellectual property rights is at most second low, the number of innovation service agencies is at most second high, and the number of employees in the high-tech service industry is at most second low, then the innovation performance of the high-tech zone is at most second low. | 12 |
| 3 | If the number of significant intellectual property rights is at most second high, the number of innovation service agencies is at most second high, and the number of employees in the high-tech service industry is at most second high, then the innovation performance of the high-tech zone is at most second high. | 2 |
| 4 | If the number of significant intellectual property rights is high, the number of innovation service agencies is high, and the number of employees in the high-tech service industry is high, then the innovation performance of the high-tech zone is high. | 1 |
Table 4: D≥ Preference Decision Rule Set

| No. | Preference decision rule | Support |
|---|---|---|
| 1 | If the number of significant intellectual property rights is low, the number of innovation service agencies is at most second low, and the number of employees in the high-tech service industry is at most second low, then the innovation performance of the high-tech zone is low. | 65 |
| 2 | If the number of significant intellectual property rights is at least second low, then the innovation performance of the high-tech zone is at least second low. | 9 |
| 3 | If the number of significant intellectual property rights is at least second high and the number of employees in the high-tech service industry is at least second low, then the innovation performance of the high-tech zone is at least second high. | 2 |
| 4 | If the number of significant intellectual property rights is high, the number of innovation service agencies is high, and the number of employees in the high-tech service industry is high, then the innovation performance of the high-tech zone is high. | 1 |
Table 3 and Table 4 show that the derived preference decision rules correctly classify most of the high-tech zones in the training set: the quality of classification exceeds 90%, indicating high classification accuracy. Reading the rule sets reveals the following:


• Innovation performance across national high-tech zones is highly uneven: a very few key parks (such as Beijing Zhongguancun) are far ahead of most others.

• The output level of significant intellectual property rights is a key factor determining whether innovation performance can rise above the second-low level.

• In the stage of transforming innovation results, if a high-tech zone's achievements and service investment are low, its innovation performance is generally at a relatively backward level; otherwise, its innovation performance is higher.


Matching the 30 analysis objects in the testing set against the rules shows that 29 objects fully match the decision rule sets; that is, the information contained in the testing set is largely consistent with the rule sets shown in Table 3 and Table 4. The rule sets therefore capture most of the knowledge in the information system and can reasonably classify and evaluate the overall innovation performance of high-tech zones. In addition, the consistency between the training and testing results indicates that the dominance rough set approach generalizes well and is effective for analyzing the performance of high-tech zones.
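To illustrate how a zone can be checked against the induced rules, the sketch below encodes the four rules of Table 4 over ordinal levels 1 (Low) to 4 (High) and combines the conclusions of all matching rules into a range of possible performance classes. The encoding, the names `RULES` and `performance_range`, and the range-combination step are assumptions for illustration; the text does not specify how the software performs the matching.

```python
LEVELS = {"Low": 1, "Second Low": 2, "Second High": 3, "High": 4}

# Table 4 rules on the reduct {C4, C5, C7}, written as (conditions, (direction, t)).
# C4: significant IP rights, C5: innovation service agencies, C7: service-industry employees.
RULES = [
    ([("C4", "<=", 1), ("C5", "<=", 2), ("C7", "<=", 2)], ("<=", 1)),  # rule 1: low
    ([("C4", ">=", 2)], (">=", 2)),                                    # rule 2: at least second low
    ([("C4", ">=", 3), ("C7", ">=", 2)], (">=", 3)),                   # rule 3: at least second high
    ([("C4", ">=", 4), ("C5", ">=", 4), ("C7", ">=", 4)], (">=", 4)),  # rule 4: high
]


def satisfied(value, direction, threshold):
    return value >= threshold if direction == ">=" else value <= threshold


def performance_range(zone):
    """Intersect the conclusions of all matching rules into a (lowest, highest) class range."""
    x = {q: LEVELS[label] for q, label in zone.items()}
    low, high = 1, 4
    for conditions, (direction, t) in RULES:
        if all(satisfied(x[q], op, r) for q, op, r in conditions):
            if direction == ">=":
                low = max(low, t)
            else:
                high = min(high, t)
    return low, high


zone = {"C4": "Second High", "C5": "Low", "C7": "Low"}  # hypothetical zone
print(performance_range(zone))  # (2, 4): only rule 2 fires
```

A zone rated High on all three attributes yields the single class (4, 4), and one rated Low on all three yields (1, 1).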
CONCLUSIONS 


Analyzing the innovation performance of high-tech zones is relatively complicated: many factors affect innovation performance, and these factors may constrain or depend on one another. The performance analysis indicators must therefore be selected as comprehensively and reasonably as possible. Accordingly, taking into account the multi-stage character of innovation in high-tech zones, this paper selects indicators from the dimensions of innovation resource input, achievement output, and achievement transformation. In addition, it applies dominance rough set theory to the innovation performance of national high-tech zones, making full use of the information contained in the data itself to obtain a more objective set of decision rules. For evaluation problems with preference information, such as innovation performance analysis, dominance rough set theory accounts for the influence of preference information on the knowledge system, which not only reflects objective reality more closely but also simplifies the rules.


Because the K-Means discretization method does not yield the numerical interval of each cluster, the rules generated by the dominance rough set cannot be stated in terms of specific interval values. In addition, this article treats high-tech industry operating income as the ultimate measure of innovation performance; other performance measures, such as the number and output value of new products, remain to be studied.